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Abstract 

Heavy  tailed  distributions  enjoy  increased  popularity  and  become  more  readily 
applicable  as  the  arsenal  of  analytical  and  numerical  tools  grows.  They  play  key 
roles  in  modeling  approaches  in  networking,  finance,  hydrology  to  name  but  a 
few.  The  tail  parameter  a  is  of  central  importance  as  it  governs  both  the  existence 
of  moments  of  positive  order  and  the  thickness  of  the  tails  of  the  distribution. 

Some  of  the  best  known  tail  estimators  such  as  Koutrouvelis  and  Hill  are  either 
parametric  or  show  lack  in  robustness  or  accuracy.  This  paper  develops  a  shift 
and  scale  invariant,  non-parametric  estimator  for  both,  upper  and  lower  bounds  for 
orders  with  finite  moments.  The  estimator  builds  on  the  equivalence  between  tail 
behavior  and  the  regularity  of  the  characteristic  function  at  the  origin  and  achieves 
its  goal  by  deriving  a  simplified  wavelet  analysis  which  is  particularly  suited  to 
characteristic  functions. 
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1  Introduction 


Heavy  tailed  distributions  enjoy  increased  popularity  and  become  more  readily  appli¬ 
cable  as  the  arsenal  of  analytical  and  numerical  tools  grows.  They  play  key  roles  in 
modeling  approaches  in  networking,  finance,  hydrology  to  name  but  a  few  applica¬ 
tions.  Examples  of  interest  include  the  stable,  the  Pareto  and  certain  extreme  value 
distributions.  The  tail  parameter  a  is  of  importance  in  its  own  right  as  the  central 
parameter  for  several  of  the  mentioned  distributions. 

In  addition,  it  sets  the  upper  bound  for  the  orders  r  beyond  which  moments  IE[|X  |’'] 
do  not  exist.  Indeed,  recall  that  a  random  variable  X  is  called  heavy  tailed  with  tail 
parameter  a  if 

P[\X\>  x]=x-^L{x)  (1) 

where  L  is  a  slowly  varying  function,  i.e.,  L{tx) / L{x)  ^  1  as  a;  ^  oo  for  any  f  >  0. 
It  is  called  Pareto,  if  L{x)  is  a  constant  and  (1)  holds  for  all  |a;|  >6.  For  the  Pareto 
distribution,  it  is  clear  that  moments  are  finite  exactly  up  to  order  a,  a  fact  that  can  be 
generalized  using  standard  facts. 

The  issue  of  finiteness  of  moments  is  particularly  pressing  in  view  of  the  abundance 
and  usefulness  of  moment  estimators.  They  are  not  only  important  for  parameter  es¬ 
timation,  when  the  underlying  distribution  law  is  known,  but  also  for  data  fitting  and 
model  selection,  i.e.,  identifying  unknown  distributions  from  sample  data.  To  recall 
but  two  instances,  the  Kurtosis  statistic  hypothesis  test  resolves  Gaussianity  versus 
non-Gaussianity,  whereas  for  a  Poisson  random  variable  mean  and  variance  should  be 
equal.  In  addition,  many  applications  integrate  moment  estimates  as  a  crucial  ingredi¬ 
ent.  That  is  the  case  in  multifractal  analysis,  where  the  g— th  order  absolute  moments 
of  the  increments  (or  the  wavelet  coefficients)  of  a  process  hold  valuable  information 
on  the  local  behavior  of  its  paths. 

Pathologies  emerge  when  moments  are  infinite  or  not  defined,  such  as  for  the 
Cauchy  distribution  which  has  infinite  second  moment  and  undefined  mean.  As  in¬ 
finite  moments  may  degrade  the  performance  of  estimators  (possibly  introducing  some 
systematic  errors)  or  reduce  the  speed  of  convergence  to  limiting  laws,  special  atten¬ 
tion  must  be  dedicated  to  their  theoretical  existence.  We  refer  once  more  to  multifractal 
analysis  where  infinite  moments  may  indicate  phase  transitions  that  are  highly  infor- 
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mative  about  the  process  regularity. 

All  this  motivates  the  development  of  statistical  methods  to  determine  the  finiteness 
of  moments  of  a  distribution  given  finite  sample  data  [10],  more  precisely,  to  determine 
the  (positive  and  negative)  critical  order  A“,  A'*'  of  a  distribution,  by  which  we  mean 
here 

A+  =  sup{r  >  0  :  lE[|Ar|’']  <  00}  ^2^ 

A“  =  inf  {r  <  0  :  IE[|Ar|’’]  <  00}. 

Related  estimators  will  not  only  provide  useful  for  the  tail  parameter,  but  also  for  the 
analogous  parameter  governing  the  distribution  around  zero,  or  any  center  of  choice  /x 
after  a  translation  X  ^  X^.  To  this  end,  we  propose  an  approach  that  combines  two 
facts. 

First,  the  characteristic  function  (t>{u)  =  IE[exp(iMAr)],  being  the  Fourier  trans¬ 
form  of  the  distribution  of  X,  has  as  many  continuous  derivatives  at  zero  as  the  dis¬ 
tribution  has  finite  moments  of  positive  orders.  In  particular,  for  even  k  we  have 
())(^)(0)  =  i^lE[X^]  whenever  one  of  the  two  is  defined  [21].  A  crucial  ingredient 
to  our  methodology  is  a  more  general  relation  of  this  sort.  To  this  end,  we  resort  to  the 
concept  of  the  characteristic  exponent  p'*'  of  a  distribution  by  which  we  mean  the  (gen¬ 
eralized)  degree  of  Lipschitz  continuity  of  the  real  part  of  the  characteristic  function  at 
the  origin.  Provided  in  lies  in  (0,  2)  the  characteristic  exponent  can  be  written  simply 
as 

p'^  =  sup{r  >0:1  —  Re^(u)  =  0(if  )  as  u  ^  0+}  (3) 

It  follows  from  basic  known  facts  that 

A+  =  p+  (4) 

as  long  as  these  values  lie  between  0  and  2.  Estimation  of  the  critical  exponent  can 
then  be  achieved  via  the  regularity  of  (f>.  Replacing  the  random  variable  2f  by  l/Jf  we 
find  and  an  estimate  for  A“. 

Part  of  the  paper  will  address  the  extension  of  this  approach  for  orders  larger  than 
2  from  the  estimation  point  of  view;  the  mathematical  foundation  of  this  extensions 
is  developed  in  a  companion  paper  [24]  and  can  also  be  found  in  [9].  While  effective 
for  model  selection,  the  characteristic  exponent  provides,  however,  only  bounds  for  the 
critical  order  in  that  case:  p'*'  <  A"*"  <  [p+J  -f  1  (see  corollary  6). 
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Second,  having  reduced  the  problem  at  hand  to  an  estimation  of  local  regularity, 
it  proves  effective  to  leverage  the  power  of  the  wavelet  transform.  In  a  nutshell,  the 
decay  of  the  wavelet  coefficients  of  a  function  W[g\  {u,  s)  for  u  close  to  0  provides  quite 
precise  information  on  the  local  regularity  of  the  function  g  at  0.  As  will  be  established, 
this  wavelet  analysis  becomes  particularly  simple  for  a  characteristic  function  (j)  and 
requires  only  the  knowledge  of  W[(t)] (0,  s). 

In  summary,  the  recipe  of  our  estimator  for  is  dramatically  simple: 

•  From  the  sample  data  set  {Xn,  n  =  1, . . . ,  N}  compute  solely  the  wavelet  coef¬ 
ficients  at  zero  of  the  empirical  characteristic  function  $(t6)  =  N~^  exp{zMAr„}, 
i.e.,  W[^]  (0,  s);  as  it  appears,  this  amounts  to  computing  the  non-parametric  un¬ 
biased  kernel  estimator  W (s): 

_  1  ^ 

Wis)  = (5) 

'  fc=i 

where  the  kernel  'k  is  a  the  Fourier  transform  of  a  semi-definite  wavelet  (see 
text). 

•  The  estimators  of  the  two  characteristic  exponents,  i.e.,  p+  and  p~  are  obtained 
from  simple  linear  regressions  of  log  W (s)  against  log  s  within  some  predefined 
scale  intervals.  These  estimators  are  scale-invariant,  can  be  made  shift-invariant, 
and  are  asymptotically  un-biased. 

•  Since  wavelets  can  not  capture  regularities  higher  than  their  own  regularity  N^, 
the  procedure  should  be  repeated  with  wavelets  of  increasing  regularity  (reason¬ 
ably  up  to  Np  =  4). 

We  will  demonstrate  the  effectiveness  of  this  estimator  looking  at  symmetrical  sta¬ 
ble  distributions  in  comparison  with  well  established  estimators  such  as  Koutrouvelis’. 
Recall  that  stable  distributions  appear  as  limiting  distributions  of  properly  renormal¬ 
ized  sums  of  iid  random  variables  with  (possibly)  infinite  variance.  The  symmetrical 
stable  laws  are  defined  by 

4’x{u)  =  lE[exp(mAr)]  =  exp(— -I-  ipu)  (6) 

and  their  heavy  tail  parameter  is  known  to  be  equal  to  a  [25].  Combining  this  with  the 
fact  that  their  densities,  though  not  explicitly  known,  are  symmetrical  and  uni-modal. 


4 


they  possess  finite  absolute  moments  of  order  r  exactly  for  r  G  (—l,a).  On  the  other 
hand,  the  Taylor  expansion  of  exp(-)  implies  readily  that  Re^(M)  =  1  —  + 

O(u^),  which  verifies  (4). 

2  Background 

In  this  section  we  collect  well  known  facts  on  the  existence  of  moments  as  well  as  the 
wavelet  analysis  of  irregular  signals. 

2.1  Tail  Estimators 

Most  well-known  tests  for  the  existence  of  moments  emerge  as  by-products  of  tail  es¬ 
timators  and  appear  in  parameterized  settings.  For  instance,  Nolan  in  [20]  proposed  a 
maximum  likelihood  estimator  for  general  alpha-stable  laws  (including  Gaussian  and 
Cauchy)  based  on  a  large  sample  data  set.  As  no  closed  form  exists  for  these  dis¬ 
tribution  functions  (aside  from  some  particular  rational  values  of  the  parameters),  he 
proposes  an  efficient  numerical  resolution  of  the  maximum  likelihood  equation. 

Previously,  Koutrouvelis  [16]  and  Me  Culloch  [18,  19],  among  many  others,  have 
proposed  two  different  estimators  of  the  parameters  of  a— stable  laws,  based  either  on 
Pareto  approximation  for  a— stables  tails,  or  on  the  analytic  form  of  the  characteristic 
function. 

More  recently,  Bianchi  and  Meerschaert  [3]  proposed  a  quadratic  estimator  of  tail 
index  a,  based  on  the  asymptotic  of  the  sample  variance.  This  robust  estimator  has  the 
advantage  over  Hill  estimator  [11],  to  be  shift  and  scale  invariant,  and  also  to  perform 
well  in  situation  where  the  Hill  estimator  is  inefficient,  namely  for  stable  distributions 
with  1.5  <  a  <  2. 

Starting  from  a  closed  form  for  the  characteristic  function  (recall  (6))  or  in  some 
cases  a  numerical  approximation  of  the  density  function  all  these  methods  aim  at  find¬ 
ing  the  minimum  of  the  log-likelihood  function,  given  the  data.  As  a  result,  it  is  well 
known  that  these  approaches  are  optimal  in  the  sense  of  minimum  variance  and  achieve 
the  Cramer  Rao  bound  [6,  16,  20].  However,  being  parametric,  these  estimators  may 
perform  poorly  whenever  the  true  underlying  distributions  do  not  match  the  model. 

In  this  paper,  we  propose  a  non-parametric  estimation  procedure  with  convincing 
robustness  properties  for  the  characteristic  exponents  p'^  and  p~  which  do  not  rely 
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on  any  assumption  on  the  density  model.  In  particular,  not  even  the  semi-parametric 
assumption  of  heavy  tails  (1)  is  made  and  can  be  tested  via  this  approach.  The  resulting 
estimates  can  be  used  for  estimating  the  tail  parameter  and  the  body  parameter. 

Notably,  both  exponents  are  estimated  in  the  same  procedure.  Indeed,  the  problem 
of  existing  negative  moments  could  be  reformulated  with  a  simple  change  of  variables 
X  as  a  positive  moment  existence  problem.  Then,  we  could  apply  our  estimator 

to  instead  of  X  directly,  allowing  thus  for  determining  a  lower  negative  bound  for 
the  existence  of  |x|“’'dF(x),  r  >  0.  However,  as  we  will  demonstrate  both,  the 
positive  and  the  negative  characteristic  exponent  can  be  evaluated  at  once,  using  the 
same  procedure  applied  to  the  same  data  set  of  i.i.d.  samples  ..jv. 


2.2  Characteristic  Function  and  Moments 


Let  us  recall  a  well  known  relation  between  high  order  moments  of  a  distribution  func¬ 
tion  F{x)  of  a  random  variable  X  and  its  so-called  characteristic  function  which  is 
defined  as: 


fiu)  =  Ee™’' 


F^^dF{x). 


(7) 


Using  simple  duality  argument  between  time  and  frequency  (via  the  Fourier  transform 
in  (7)),  the  behavior  of  the  characteristic  function  at  the  origin  relates  to  the  tail  be¬ 
havior  of  the  distribution  F  for  large  |x|.  In  particular,  whenever  the  integer  p-th  order 
moment  of  F  exists,  the  p-th  derivative  of  <j)  at  the  origin  exits  as  well  and  they  simply 
relate  as  follows 


dP 

duP  ^  ^ 


u—0 


=  iPJE[XP]  =iP  j  xPdF{2 


(8) 


This  justifies  4>  to  be  also  referred  to  as  a  moment  generating  function. 

Conversely,  when  p  is  even,  existence  of  (jfp'>  (0)  implies  existence  of  E[XP].  No¬ 
tably,  pathologies  can  occur  when  p  is  odd.  As  the  example  of  ^(m)  =  ^^2  cosju/{j'^  logj) 

demonstrates  (cpre.  [15,  p.  411]),  the  existence  of  ^(^^(0)  does  not  necessarily  guar¬ 
antee  the  existence  of  E[X]. 

As  we  strive  towards  a  generalization  of  a  relation  between  moments  and  charac¬ 
teristic  function  to  non-integer  orders  r  >  0,  let  us  first  introduce  the  absolute  moments 
of  order  r  S  E: 

Mr  =  nm  =  J  krdu(x)  (9) 
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where  we  allow  the  value  oo.  Let  us  emphasize  that  1E[XP]  exists  if  and  only  if  A4p 
is  finite,  in  other  words,  if  and  only  if  both  IE[max(X,  0)^]  and  IE[max(— X,  0)^]  are 
finite. 

We  first  recall  the  definition  of  A+  in  (2)  and  note  a  simple  fact: 

Lemma  1  For  any  distribution  F  we  have 
=  sup{r  >  0  :  Mr  <  oo} 

=  supjr  >0:1  —  F{x)  +  F{—x)  =  0{x~‘^)  as  x  ^  oo}  (10) 

Note  that  a  priori  there  is  no  information  on  the  behavior  in  (10)  for  r  exactly  equal 
to  A+. 

Proof 

To  obtain  one  half  of  the  lemma  recall  the  Markov  inequality  which  states  that 

P  [|A:|  >  a]  <  a-’'lE|A:r,  Vr  >  0,  Va  >  0.  (11) 

Consequently,  1  —  F{a)  +  F{—a)  is  0(0“’’)  for  all  r  >  0  with  finite  Mr-  The  other 
half  of  the  lemma  follows  from  theorem  [15,  Th.  11.3.1]  which  states  that  1  —  F{a)  + 
F{—a)  =  implies  that  Mr'  is  finite  for  all  r'  <  r.  ^ 

Next,  we  apply  a  theorem'  due  to  Binmore  and  Stratton  [4,  15]  which  relates  the 
Lipschitz  regularity  of  (p  at  the  origin  to  the  tail  decay  of  F  for  orders  less  than  2. 
Recalling  the  definition  of  the  Lipschitz  exponent  of  (p  given  in  (3)  we  find: 

Corollary  2  If  either  M  or  p'^  is  known  to  be  strictly  less  than  2  then: 

A+=p+.  (12) 

With  this  in  mind,  we  present  wavelet  theory  in  the  next  section  with  particular  em¬ 
phasis  on  their  natural  abilities  to  detect  and  estimate  the  local  regularity  of  a  function. 

2.3  Wavelets  and  Local  Regularity 

A  wavelet  analysis  consists  in  a  linear  decomposition  of  a  signal  g  onto  a  set  of  analyz¬ 
ing  functions^ 

=  |srV((M  -  t)/s),  (t,  s)  G  3?  X  3?+}  (13) 

^Let  0  <  r  <  2.  Then,  1  —  Re  0('u)  =  Oiu^)  if  and  only  if  -P[|X|  >  x]  =  0{x~'^). 

^We  restrict  ourselves  to  the  case  of  real  continuous  wavelet  transforms,  even  though  all  theoretic  results 
we  present  here  transpose  directly  to  the  discrete  framework  of  real  orthogonal  wavelets. 
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through  the  inner  product 


W[g](t,s)  =  J  (14) 

Conceptually,  this  transform  can  be  viewed  as  a  partitioning  of  the  time-frequency 
space,  where  W[g\  {t,  s)  measures  the  correlation  between  g  and  each  elementary  atom 
V't.s-  All  of  these  time-frequency  cells  ^  are  time-shifted  and  scale  changed  ver¬ 
sions  of  a  unique  prototype  function  i/;.  Therefore,  for  the  time-frequency  tiling  to  be 
consistent,  the  mother  wavelet  must  be  localized  in  the  time  and  in  the  frequency  do¬ 
main  simultaneously.  Formally,  these  constraints  transpose  to  the  following:  We  call  a 
wavelet  ^  admissible  of  regularity  N^,  if  it  has  the  following  three  properties: 

•  <^1(1-^  |f|)-^’^-iforfc  =  0,...iVv„ 

•  f  df  =  0  for  fc  =  0, . . .  —  1,  and 

.  /“ =  /“  |vI/(-i.)|V^di.  =  1. 

Now,  because  equation  (14)  conveys  information  on  the  local  oscillatory  behavior 
of  the  analyzed  function  g,  it  is  possible  to  assess  the  local  Lipschitz  exponent  of  g 
from  the  dynamic  of  wavelet  coefficients  across  scales.  A  simple  fact  reads  as  follows 
(see  [12,  14],  also  [23]): 

Theorem  3  Consider  an  admissible  wavelet  f  of  regularity  >  r.  Assume  that 
g{u)  —  p(0)  =  0{u'^)  as  u  —>  0.  Then,  there  is  a  constant  C  such  that 

|fF[(7](0,  s)|  <  C's’’  ass^0+.  (15) 

Reciprocally,  somewhat  more  precise  knowledge^  as  u  ^  0  of  the  decay  of  FF[p]  {u,  s) 
for  all  u  allows  to  determine  the  local  continuity  of  the  function  g  [14,  13].  As  we  will 
elaborate,  a  certain  type  of  wavelet  analysis  of  the  Lipschitz  continuity  of  characteris¬ 
tic  functions  simplifies  dramatically  due  to  the  fact  that  the  wavelet  transform  is  in  this 
particular  case  maximal  at  the  origin. 

^For  a  simplified  version  consider  0  <  r  <  1.  The  following  condition  implies  that  g{u)  —  g{0)  = 
0{u^)  [12,  14,  13]:  there  exist  numbers  C  and  q  <  r  such  that 


for  s  ^  0+. 


(16) 


3  Wavelet  analysis  of  Characteristic  Functions 


We  start  by  demonstrating  how  the  wavelet  analysis  of  characteristic  functions  can  be 
simplified  tremendously. 


3.1  Semi-definite  Wavelets 


As  it  turns  out  it  is  particularly  simple  to  estimate  the  wavelet  coefficients  of  a  char¬ 
acteristic  function  provided  the  wavelet  ip  is  semi-definite  by  which  we  mean  that  its 
Fourier  transform  =  /  ip{t)  exp(— is  real  and  does  not  change  sign.  In 

other  words,  ip  is  either  positive  semi-definite,  i.e.,  ^'(•)  >  0,  or  it  is  negative  semi- 
definite,  i.e.,  '!'(•)  <  0.  Examples  of  such  wavelets  are  the  derivatives  of  even  order  of 
the  Gaussian  kernel:  set 

A 

exp(-CT2f^)  (17) 

where  Cp  is  a  normalization  constant  and  p  is  a  positive  integer.  One  finds  the  semi- 
definite  Fourier  transform 

=  Cp{-lYv'^P  (4^)  ■ 

Lemma  4  If  the  Fourier  transform  4^  of  the  wavelet  ip  is  real,  square  integrable  and 
semi-definite  then 

|lE[Re(/)](f,s)|  <  |VF[Re<^](0,s)|  (19) 


In  other  words,  for  fixed  scale  s  the  modulus  of  the  wavelet  transform  of  the  real  part 
of  a  characteristic  function  is  maximal  at  f  =  0  for  semi-definite  'k. 

Proof 

Recall  that  —  t)/s)  and  'I'(sa;)  exp{—itx)  form  Fourier  pairs,  as  well  as  (p 

and  F.  Applying  Parseval’s  identity  yields 

lF[Re(/)](f,  s)  =  Re  J \s\~^ip{{u  — t)/s)(p{u)du 

=  Re/»(,x)exp(-i,x)dF(x)  (20) 

Using  the  simple  estimate  |Rex|  <  \x\  as  well  as  the  fact  that  is  semi-definite  and 
does  not  change  its  sign  we  obtain 


W\Re(p]{t,  s) 


< 


/n.(»)exp(-((x)|dF(x) 


|4'(sx)|  dF(x) 
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(21) 


as  desired. 


/»(,x)dFW 


iy[Re(/)](0,s) 


As  a  corollary  from  (20)  we  note 


fF[Re0](O,s) 


J  'i’isx)dF{x)  =  1E[^'(sA:)] 


❖ 


(22) 


3.2  Critical  orders  smaller  than  2 

We  are  now  in  a  position  to  combine  the  above  results  into  the  anticipated  tight  con¬ 
nection  between  a  wavelet  analysis  and  the  critical  order  A'*'.  For  orders  larger  than  2 
the  connection  is  less  tight,  yet  still  useful  (see  Section  3.3). 


Theorem  5  Consider  an  admissible,  semi-definite  wavelet  of  regularity  N,,p  >  2. 
Then, 


A"*"  =  =  sup{r  >  0  :  |FF[Re(/()](0,  s)|  <  Cs^  for  s  ^  (23) 

provided  that  either  term  is  known  a  priori  to  be  strictly  less  than  2. 

^From  a  wavelet  point  of  view  we  can  not  stress  enough  that  the  above  result  owes 
its  simplicity  to  the  fact  that  the  wavelet  coefficients  of  Re  f  are  maximal  at  0.  Also, 
A+  =  p+  was  noted  earlier. 

Proof 

Due  to  lemma  4,  the  condition  ( 16)  of  footnote  3  follows  trivially  from  fF(0,  s)  <  C's’’ . 
The  extension  to  0  <  r  <  2  exploits  the  symmetry  of  Re  </>  to  conclude  that  the  best 
polynomial  approximation  of  Re  (j>  of  degree  1  is  still  constant  (for  a  full  argument  see 
the  companion  paper  [24]  or  [9]).  <)> 

3.3  Critical  orders  higher  than  2 

Attempting  to  extend  the  appealing  three-fold  connection  of  theorem  5  to  orders  higher 
than  2  we  face  two  hurdles,  one  surmountable  due  to  special  properties  of  the  charac¬ 
teristic  function,  the  other  more  profound. 
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For  a  better  understanding,  we  need  to  extend  the  concept  of  Lipschitz  continuity 
to  higher  orders.  To  this  end,  we  define  the  Taylor  rest-term  of  order  2p  at  zero  as: 

p  pk 

Q^pit)  =  Re m  -  1  -  E 

whenever  it  exists.  Thus,  the  general  definition  of  reads  then  as 

p'^  =  sup{r  >  0  :  Q2p{u)  =  0(if)  as  rt  ^  0+,  for  2p  <  r  <  2{p  -f  1)}  (25) 

Also,  we  require  a  more  general  version  of  corollary  2.  The  higher  order  exten¬ 
sion  of  Binmore  and  Stratton  [4]  is  found  in  Kawata  [15]  and  relates  the  finiteness  of 
moments,  i.e.,  the  value  of  A"*"  to  a  smoothness  condition  of  Q2p- 

The  first  hurdle  concerns  the  fact  the  wavelet  analysis  is  a  powerful  tool  for  assess¬ 
ing  the  local  degree  of  regularity,  but  does  in  general  not  allow  to  make  conclusions 
on  differentiability  of  the  analyzed  function.  To  make  the  point,  functions  which  be¬ 
have  at  zero  as  |  •  (modulo  a  polynomial)  but  have  only  one  derivative  are  easily 
constructed.  In  other  words,  the  corrective  polynomial  does  not  have  to  be  the  Taylor 
polynomial  as  in  (24).  This  difficulty  is  overcome  by  proving  existence  of  moments 
directly  via  monotone  convergence  from  the  decay  of  appropriate  wavelet  coefficients 
(see  the  companion  paper  [24]  or  [9]).  Finite  moments  imply  then  that  (p  was  indeed 
differentiable  and  that  wavelet  analysis  indeed  reflects  the  regularity  of  Q2p- 

The  second  hurdle  stems  from  the  fact  that  Kawata’s  smoothness  condition"*  (which 
allows  to  compute  A+)  is  in  terms  of  an  integral  and  weaker  than  the  Lipschitz  condition 
(25)  (which  is  the  one  resulting  from  wavelet  analysis).  However,  we  are  able  to  obtain 
bounds.  We  state  only  the  final  result  and  leave  mathematical  details  to  a  companion 
paper  [24]  (see  also  [9]). 

Corollary  6  In  general,  the  Lipschitz  regularity  of  a  characteristic  function  (25)  is 
related  to  the  critical  order  of  moments  (2)  via 

/9+  <  A+  <  [p+J  -f  1.  (27) 

Note  that  [p+J  -|-  1  is  the  smallest  integer  strictly  larger  than  p+. 

Assume  that  (0)  exists.  Then  Air  exists  if  and  only  if  [15] 

|Q2p(f)|dt  <  oo.  (26) 
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3.4  Negative  Critical  Orders 

We  are  now  interested  in  estimating  the  negative  critical  order  A~  defined  in  (2),  of 
a  random  variable  X  with  density  dFx{x).  Let  us  define  a  new  random  variable  Y 
using  the  one  to  one  mapping  from  IRtolR:  Y  =  g{X)  =  X~^.  Fixing  Y  =  y, 
equation  y  =  g{x)  has  only  one  root  x  =  y~^,  and  |p'(x)|  =  from  which  we 
deduce  the  distribution  of  Y ,  dFyiy)  =  y~^dFx{y~^)-  The  negative  qth  order  of  X 
simply  corresponds  to  the  positive  — qth  order  of  random  variable  Y : 

IE[|X|«]  =lE[|l/r|«]  =IE[|y|-«].  (28) 

Therefore,  to  estimate  A“(2f)  of  X,  we  can  directly  apply  general  results  obtained 
in  Section  3.3  for  positive  higher  orders,  to  get 

A-(X)  =  -A+(1/X)  (29) 

4  Estimation  procedure 

In  this  section,  we  elaborate  on  the  implementation  of  our  estimator  for  A+,  in  partic¬ 
ular  the  choice  of  wavelet  and  scales  to  consider,  its  bias,  robustness  and  optimality 
properties. 

4.1  Implementation 

Given  Xi,  {i  =  1, . . . ,  N)  a  set  of  N  observed  i.i.d.  samples  of  the  distribution  dF(a;), 
we  use  the  empirical  estimator  for  the  characteristic  function 

.  N 

^(u)  =  (j)N{u)  =  —  (30) 

k^l 

For  our  purpose,  we  need  to  evaluate  this  function  on  a  properly  sampled  interval 
Uj  =  j  ■  6u,  j  =  0, K  —  1,  that  we  will  make  more  precise  later. 

We  now  recall  some  convergence  properties  of  this  empirical  characteristic  func¬ 
tion  (see  [7,  8]  for  details),  justifying  its  use  in  the  rest  of  our  method.  First,  (j>N{u) 
converges  almost  surely  when  N  goes  to  infinity  towards  (j){u)  in  the  sense,  over 
some  finite  interval  T 

sup  I^Ar(u)  -  (()(m)|  ^  0.  (31) 

|u|<T 
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Second,  consider  the  random  process  YAr(u)  =  '/N{4>Niu)—4'{u)  )  and  let  Y{u)  = 
Y  {—u)  be  a  zero  mean  complex  Gaussian  process  with  covariance  structure  lEF  {u)Y  (v) 
(j){u  +  v)  —  4>{u)(j){v).  Then,  the  sequence  Yn{ui),  Yn{u2),  ■  ■  ■ ,  Y]\[{um)  converges  in 
distribution  to  Y  {m)  ,Y  {U2) ,  ■  ■  ■  ,Y  (um)- 

It  is  also  shown  in  [8]  that  Yn{u)  converges  weakly  towards  Y{u)  in  any  finite 
interval,  provided  that  IE|X|^+‘^  <  00. 

Next  to  consider  is  the  wavelet  decomposition  of  4)n{'u)  which  simplifies  to 


W[(l)N]{t,s) 


j  ipt,s{v)  4>N{v)dv 


=  / V'i.s(t^)exp{zXfcu}(iu 

k 

=  ^'^exp{iXkt}  I  'ilj{u)exp{iXkSu} 
k  '' 

=  ^'^'i’{s-Xk)exp{iXkt}. 
k 

Two-point  Estimation  Procedure 

(1)  Assuming  that  T'  is  real,  semi-definite  we  finally  arrive  at  the  surprisingly  sim¬ 
ple  estimator  for  the  maximal  wavelet  coefficient  of  Re  </>  of  scale  s,  which  is  the  main 
ingredient  in  our  method: 


N 

W{s)  =  lR[Re(/.w](0,s)  =  (l/iV)^T'(s-Xfc).  (32) 

fc=i 

(2)  Finally,  according  to  theorem  5  the  characteristic  exponent  is  estimated  from 
the  powerlaw  exponent  which  steers  the  decay  of  kF(s).  An  estimator  of  the  critical 
moment  order  results  from  either  corollary  2  or  corollary  6.  Taking  the  logarithmic  of 
this  powerlaw  model  yields  the  linear  trend 


log  W  (s)  «  p+  log  s  +  log  C,  (33) 

where  p+  is  simply  obtained  via  a  standard  (weighted)  linear  regression  procedure  of 
log  W (s)  against  s  restricted  to  some  scaling  interval  (smin,  Smax)  to  be  specified. 

Robustness  Since  we  assume  nothing  on  the  distribution  we  obtain  thus  a  non- 
parametric  estimator.  We  also  note  immediately,  that  the  estimation  can  be  made  shift 
invariant  by  subtracting  the  sample  average  from  the  data  X  and  that  it  is  scale  invari¬ 
ant. 
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Indeed,  consider  the  data  X'  =  aXi.  Then  W[X']{s)  =  W[X]{as).  Rewriting 
log(s)  as  log(as)  —  log(a)  we  find  that  the  regression  data  of  X'  and  X  differ  merely 
by  a  shift,  leading  to  the  same  estimated  least  square  slope. 

4.2  Statistics  of  the  estimator 

Let  us  study  the  bias  of  both,  the  simple  estimator  of  the  wavelet  coefficient  (32)  itself, 
as  well  as  the  derived  estimation  of  the  scaling  exponent  (33). 

Since  all  observations  are  drawn  from  the  same  distribution,  we  may  write 

1  ^  f 

IE[W"(s)]  =  •  ^)]  =  /  'i'{sx)dF{x).  (34) 

^  k=l 

This  shows  that  as  an  estimator  of  the  wavelet  coefficient  VL(0,  s)  itself,  W{s)  is  un¬ 
biased.  However,  as  we  will  show,  a  bias  is  introduced  as  we  estimate  the  powerlaw 
decay  of  1L(0,  s)  through  the  powerlaw  decay  of  W{s).  This  result  is  similar  to  the 
one  obtained  in  [1]  where  it  is  shown  that  using  log-periodograms  (Welch  estimator) 
to  analyze  processes  with  spectra  of  the  type  Txif)  ~  leads  to  a  systematic 

bias  on  the  estimate  of  a.  On  the  other  hand,  using  a  wavelet-based  spectral  analysis 
(the  frequency  marginal  of  a  wavelet  decomposition)  yields  an  asymptotically  unbiased 
estimator  for  exponent  a.  This  is  due  to  the  constant  relative  bandwidth  of  wavelets 
that  performs  a  logarithmic  tiling  of  the  time  axis.  The  resulting  time-band  analysis 
has  joint  time  and  frequency  resolutions  that  match  naturally  powerlaw  decays  as  in 
^xif),  or  in  our  case,  in  (j){u)  around  the  origin. 

Estimating  the  critical  order:  A  showcase  To  explore  the  properties  of  an  estimator 
of  the  characteristic  exponent  /?+  through  the  wavelet  coefficients  we  first  treat  a  simple 
case  where  we  assume  that 

•  the  distribution  is  Pareto,  i.e.,  F'{x)  =  px{x)  =  cqx~°‘~^  for  \x\  >  6  and 
vanishes  elsewhere,  with  cq  =  ad"; 

•  the  wavelet  is  bandlimited,  actually  require  that  =  0  for  \v\  <  v,  where 
^  >  0  is  some  constant. 

Such  wavelets  are  known  to  exist.  For  instance,  by  construction,  the  auto-correlation 
function  of  any  admissible  and  band-limited  wavelet  is  itself  a  symmetric  in  time,  band 
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limited  and  positive  definite  admissible  wavelet. 

Inserting  the  particular  form  of  px  (x / s)  into  the  bias  formula  (34)  we  can  extract 
the  scale  s  through  a  substitution.  Provided  s  is  small  enough,  i.e.,  s  <  y_/8,  the 
remaining  integral  is  independent  of  the  scale  due  to  the  band  limitation  of  'P.  This 
reads  as: 


lE[VP(s)] 


^03  OOO 

/  'i>{xs)cox~°'~^dx  =  s°‘ ■  ■i>{y)coy~°‘~^dy 

Js  JSs 

pOO 

s“  •  /  T'(t/)cot/"“"^dy  =  C'^,(a)  •  s“. 

J  V 


(35) 


Thus,  the  exact  powerlaw  of  the  density  translates  into  one  of  W{s)  thanks  to  the 
band  limitation  of  the  wavelet.  Apart  from  this  show-case,  approximatively  the  same 
decay  of  W{s)  can  be  observed  under  much  less  restrictive  assumptions,  as  we  are 
about  to  show. 


Estimating  the  critical  order  of  fat  tail  distributions  We  relax  the  above  assump¬ 
tions  to  the  following  scenario: 


We  consider  a  simple,  heavy  tailed  probability  density  function  which  is  sym¬ 
metrical,  constant  around  the  origin  and  which  follows  an  exact  powerlaw  in  the 
tails: 


F'{x)  =  px{x) 


Cl  if  |a;|  <  5, 

C2\x\~°‘~^  if  jxj  >  6, 


(36) 


where 


la  ,  6  a 

=  —  ■ -  and  C2  =  —  •  - ]- 

26  a  +  V  2  1-1-  a/v 


•  The  wavelet  is  sufficiently  regular: 


(37) 


Let  us  comment  on  this  choice.  Despite  its  special  form,  this  distribution  will  be 
sufficient  to  explore  general  fat  tail  distributions.  Clearly,  it  has  finite  moments  of  the 
orders  between  A“  =  —  1  and  A+  =  a.  Also,  v  =  Px{6)/px{^)  is  the  ratio  of  the  tail 
amplitude  to  the  constant  value  around  zero.  Clearly,  the  bound  (37)  is  restrictive  only 
at  small  v,  as  T'  is  integrable  and  must  decay  at  infinity. 
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To  show  that  JE[W (s)]  scales  as  s“,  we  need  to  generalize  (35)  and  split  the  integral 
of  (34)  into  two  parts,  |a;|  <  (5  and  |x|  >  6.  We  claim  that  the  hrst  part  is  of  the  order 
while  the  second  term  behaves  as  s“  plus  a  term  of  the  order  .  In  summary, 
the  wavelet  estimator  decays  indeed  as  with  an  error  term  in  the  order  ,  which 
may  introduce  a  bias  in  the  estimation  of  A+  =  a. 

Applying  (37)  we  hnd 

/6  p6 

'i'{sx)px{x)dx  <  Cl  ■  /  '^{x)dx  <  Cl  ■  26  ■  (38) 

-s  J-s 

as  claimed.  Next,  similarly  to  (35)  we  obtain 

pOO  i  pOO  pOO 

/  'i’ {sx)px ix)dx  =  -  '^{y)px{y/s)dy  =  3°^  ■  C2  'S{y)y~°'~^dy.  (39) 

To  the  contrary  of  (35)  this  integral  depends  on  s.  Thus,  we  write  it  as  The 

hrst  term  is  now  a  constant,  leading  to  the  announced  behavior  as  s“.  To  estimate  the 
second  term,  we  estimate  T*  in  a  way  similar  to  (38): 


/-S'?  (36)^’I’~ 

vl/(y)y-“-idy  <  4  /  y^^y-^-^dy  =  ^ 


(40) 


~  o: 

Collecting  (38)  -  (40)  we  hnd  that  IE[VC(s)]  =  As“  +  0{s^*).  Bounding  the  error 
relied  on  the  regularity  (37)  of  the  wavelet,  while  the  exact  scaling  derives  directly  from 
the  exact  powerlaw  (36)  of  the  distribution.  We  generalize  this  result  as  follows: 

Proposition  7  Assume  that  4*  is  positive  semi-definite.  Assume,  the  distribution  has  a 
density  px  which  can  be  bounded  as  follows: 


<  a|a;|  “  ^  for  |a;|  >  <5, 

Px{x)  <(  >  b\x\~°‘~'^  for  jxl  >  5, 

<  c  for  jccj  <  6. 


(41) 


Assume  that  the  regularity  of  the  wavelet  xp  is  larger  than  the  critical  order,  i.e.,  > 

a.  Then, 


a  •  s“  +  0{s^*)  >  lE[iy(s)]  >  6  •  s“  + 


N,i, 


(42) 


with  d/b  =  a/b. 


Proof 

Since  xp  has  vanishing  moments  we  know  that  (37)  holds.  Proceeding  as  before. 
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we  write 


]E[M^(s)]  =  /  'i!{sx)px{,x)dx  + s  'ii{y)pxiy/s)dy  (43) 

J  —5  J  |y|>s(5 

The  first  term  is  bounded  from  above  as  0{s^*  )  as  in  (38).  The  second  term  maybe 
framed  using  the  tail  bounds  on  px  as 


s°‘  ■  al  >  s 


-1 


>\y\>sS 


'^iy)pxiy/s)dy  >  s“  •  bl, 


(44) 


where 

I  = 


p  poo  psS 

/  4/(y)|2/|-“-idy=  /  4/(y)|y|-“-idy-  /  vI;(y)|y|-«-idy. 

J\y\'>s5  J —  oo  J —s5 


Here,  the  last  term  can  be  bounded  as  0{s^*  “)  as  in  (40).  It  combines  with  the  factor 
s“  of  (44)  to  a  0{s^'^).  So,  only  one  term  behaves  as  s“  and  we  find 


IE[W(s)]  =  As°‘  + 


where 


/OO  pOO 

'^{y)\y\~°'~^<iy>A>b-  /  'S{y)\y\~°'~^dy. 

-OO  J  — OO 

A  more  careful  computation  reveals  that 

§N^-a 


B  <  25^*+^  ■c-d^  +  2ady 


-  a) 


(45) 

(46) 

(47) 

❖ 


4.3  Numerical  robustness 

Provided  that  the  observations  Xk  (k  =  1, . . . ,  N)  are  un-correlated  one  finds  easily 

var  VP(s)  =  (l/iV)var  ('I'(sAr)) .  (48) 

Moreover,  under  the  assumptions  of  proposition  7  we  conclude  that  lET'(sAr)  ~  s“ 
and,  considering  as  a  wavelet,  lET'^(sAr)  ~  s“.  Thus, 

var(T'(sA:))  =  IET'2(sX)  -  (lET'(sX))^ 

~  s“iet'2(a:)  -  5^“  (]ET^(A:))^ 
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To  provide  a  more  rigorous  error  estimate  let  us  assume  that  we  consider  the  ele¬ 
mentary,  yet  admissible,  wavelet 


T'(x) 


A^,  xel^  =  [v^-  \N^  +  \N^ 

0,  otherwise. 


(49) 


This  somewhat  crude  boxcar  approximation  for  the  wavelet  becomes  reasonably 
accurate  for  the  derivatives  of  the  Gaussian  kernel  xpp  (17)  as  we  set  ji:. 

Indeed,  |T'p|  reaches  its  maximal  value  Cp(p(j^/7r^)^’ exp(— p)  at  this  v^.  Clearly,  the 
approximation  becomes  more  accurate  as  the  regularity  increases,  i.e.,  oo. 

For  the  box-car  wavelet  we  get 

_  ^2 

varkF(s)  =  (l/7V)var  (T'(sX))  =  ^  {px  [sX  G  I^]  -  p\  [sX  G  I^])  . 


Assuming  an  exact  powerlaw  for  the  tail  as  in  (36)  we  may  write,  provided  the  scale  is 
sufficiently  small,  i.e.,  s  <  -  y/N^/2)/d: 

Px  [sX  G  /^]  =  /  C2X~°‘~^dx  =  s“  •  C2  /  y~°'~^dy. 

Using  the  mean  value  theorem  we  may  rewrite  the  integral  by  where  y^ 

is  some  number  in  thus,  ~  Finally,  given  ip  has  unit  energy,  i.e.  A^p  = 


varVF(s) 


C2 


S°C2  \ 


(50) 


For  small  scales  (s  — >  0),  the  variance  behaves  like  varlU (s)  ~  O  (s“).  Figures  4.3(a)- 
(c)  show  empirical  variance  vartU(s)  varying  with  parameters  N,  s  and  Np,,  attesting 
the  good  agreement  between  experimental  and  theoretical  results. 

Let  us  now  consider  the  new  variable  logkF(s).  With  a  central  limit  theorem  ar¬ 
gument,  we  can  say  that  W (s)  is  asymptotically  normal  with  mean  <5^  «  As°‘  and 
variance  «  Cs°‘.  Then,  in  first  approximation,  using  a  result  on  functions  of 
asymptotically  Gaussian  variables  [21,  26],  we  conclude  that  logkF(s)  is  asymptot¬ 
ically  Gaussian  and 


IE  log  W (s)  «  log  lEkF (s)  «  log  A  +  a  ■  log(s) 
varlogkF(s)  «  |IEkF(s)|“^varlU(s)  «  B/A  ■  s““ 


(51) 
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(a)  Empirical  variance  varVE (s) 


(b)  Empirical  variance  varlE (s) 


■ 

- 

I 

logs 


Eigure  1:  Experimental  verification  of  expressions  (50)  and  (51).  (a)-(c):  Empirical 
estimates  of  varlE (s)  estimated  over  a  set  of  100  independent  realizations  of  a-stable 
processes  of  length  N.  Evolution  of  varVE (s)  is  plotted  as  a  function  of;  (a)  log  N 
(a  =  1.2,  =  A,  s  =  0.0087);  (b)  logs  {a  =  1.2,  =  4,  N  =  2^^);  (c) 

(a  =  1.2,  s  =  0.0087,  N  =  2^^).  (d)  Empirical  estimation  of  loglEVE(s)  versus 
logs.  The  error  bars  correspond  to  the  standard  deviation  of  log  VE(s).  The  dashed 
line  materializes  the  theoretical  law  loglEVE(s)  =  a  ■  log(s)  +  C'  (a  =  1.2,  =  4, 

N  =  2^4). 
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See  figure  4.3(d). 

To  summarize  the  above,  we  propose  to  estimate  the  characteristic  exponent  p'^ 
via  the  estimator  of  the  scaling  exponent  a  of  the  wavelet  coefficients  in  51.  For 
(asymptotically)  Gaussian  random  variables  such  as  51,  the  maximum  likelihood  es¬ 
timator  of  a  is  simply  obtained  from  a  linear  regression  of  logW (s)  against  logs,  as 
already  suggested  in  33.  Asymptotically,  the  resulting  estimate  converges  to  In 
practice  though,  the  finite  size  data  set  limits  the  regression  range  to  some  interval 
s  G  (smin  >  0;  Smax  <  oo).  The  important  issue  of  properly  choosing  this  scaling 
region  is  treated  in  the  next  section. 

4.4  Choice  of  the  scale  range 

We  have  defined  an  estimator  for  via  a  log-log  linear  fit.  While  in  theory  the  wavelet 
coefficients  should  decay  as  a  powerlaw  of  the  scale,  we  are  in  practice  faced  with  the 
fact  that  the  scaling  deviates  significantly  from  the  theoretical  ideal  case  for  both  large 
and  small  scales.  Here  we  discuss  the  reasons  for  this  deviation  and  explain  how  to 
choose  the  scaling  region. 

4.4.1  Lower  bound  of  scaling  region 

We  have  two  different  approaches  to  determine  a  lower  bound  for  the  scale  range  of 
the  linear  regression  log  W (s)  versus  log  s  in  (33). 

The  first  one  is  based  on  a  Shannon-like  theorem.  Our  estimator  estimates  the 
singularity  of  the  characteristic  function  at  the  origin.  In  practice,  we  use  the  em¬ 
pirical  estimator  for  the  characteristic  function,  i.e.,  (j){uk)  =  N~^  eyi\){iukXi). 
The  maximum  variation  of  (p  is  controlled  by  the  maximum  value  of  Xj.  There¬ 
fore  sampling  0  at  a  higher  rate  than  approximately  the  Nyquist  rate  with 

X  =  vadx.{Xj,  j  =  1,  •  •  •  N},  does  not  bring  any  finer  information  on  the  regularity 
of  0(u)  at  M  =  0.  On  the  contrary,  when  the  analyzing  scale  goes  below  the  minimum 
bound  Smin  =  the  measured  regularity  is  overestimated,  as  the  function  under 

analysis  reduces  to  the  sole  C°°  component  ex-p{iukX),  oversampled  at  the  vicinity  of 
the  origin.  Thus,  concordantly  with  theorem  5,  when  a  is  estimated  from  data  below 
this  minimum  scale  bound  it  reflects  the  regularity  of  the  analyzing  wavelet  rather 
than  the  targeted  regularity  of  the  characteristic  function. 
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The  second  approach  starts  from  the  expression  (32).  In  order  to  be  consistent,  we 
need  to  ensure  that  at  least  one  sample  Xj  falls  inside  the  equivalent  support  of  T'(sa:). 
For  small  s,  only  the  largest  values  of  Xj  are  retained  to  enter  the  sum  (32).  As  a 
result,  if  X  is  the  maximum  sample  of  the  series  Xj,  I'^/s  is  the  central  frequency  of 
the  wavelet  at  scale  s.  We  then  want  X  «  v^/s,  which  leads  to  Smin  ~  v^jX. 

In  summary,  both  arguments  above  lead  to  the  same  conclusion  that  the  lower  cut¬ 
off  scale  should  be  chosen  proportionally  to  XjX.  For  the  numerical  analysis  in  this 
paper  we  adopted; 

s„,in  =  IjX.  (52) 

Using  a  stable  law  with  index  of  stability  (or  shape  parameter)  a,  we  present  in 
figure  4.4.2  the  theoretical  lower  scale  bound  Smin  =  A  linear  regression 

of  log  W (s)  versus  log  s  for  s  >  Smin  yields  an  accurate  estimate  of  the  characteristic 
exponent  p~^  =  a.  Moreover,  on  this  same  plot,  we  verify  that  for  s  <  Smin,  the  wavelet 
estimator  W{s)  behaves  like  s^'^,  in  accordance  with  the  aforementioned  argument 
that  the  function  under  analysis  is  now  the  (7°°  exponential  exp(mAr). 

4.4.2  Upper  bound  and  negative  moments 

As  we  saw,  existence  of  moments  is  dictated  by  the  tail  decay  of  the  distribution  F{x) 
for  X  ^  oo.  For  instance,  it  is  shown  in  [25],  that  the  asymptotic  tail  behavior  of  a 
stable  law  is  Pareto  when  0  <  a  <  2.  Defining  when  exactly  this  asymptotic  behavior 
starts  seems  to  be  a  tough  problem  (see  [20]),  as  it  depends  heavily  on  the  parame¬ 
terization  that  is  used  to  model  the  distribution  (in  the  parametric  context).  We  just 
pretend  here,  that  the  upper  cutoff  scale  Smax  below  which  W{s)  behaves  like  s“  is 
also  determined  by  this  cutoff  value  of  X  separating  the  tail  behavior  (as  a  Pareto  law 
for  instance)  from  the  body  of  the  distribution.  We  illustrate  this  with  a  compound  dis¬ 
tribution,  made  out  of  a  uniform  distribution  for  |Ar|  <  6  and  a— stable  distribution  for 
iXj  >  6.  We  show  with  this  simple  example  (see  Figure  4.4.2),  that  the  upper  cutoff 
scale  is  of  order; 

Smax  =  (5-^  (53) 

where  S  marks  the  transition  from  body  to  tail  behavior  in  the  distribution.  In  practical 
applications  one  might  choose  S  from  prior  knowledge  (rendering  the  estimator  semi- 
parametric)  or  estimate  6  itself  from  the  scaling  plots  (see  Figure  4.4.2). 
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Beyond  this  upper  limit,  the  wavelet  estimator  logM^(s)  decays  with  slope  — 1. 
This  particular  value  of  the  slope  depends  only  upon  the  distribution  we  have  chosen 
for  the  body  of  our  compound  distribution.  In  our  example,  the  uniform  distribution 
has  negative  moments  only  forp  >  A“(X)  =  —1.  That  is  precisely  this  bound  that  is 
estimated  in  4.4.2,  when  s  >  Smax-  To  support  our  claim,  we  simply  follow  the  same 
lines  as  for  the  tail  estimator  (35) :  Given  Px{x)  ~  x  ^  0,  then 

p5  p6.s 

lElT^(s)  =  /  T'(sx)Px(a;)d2^  =  /  T'(a:)|s“^xP“^s“^dx, 

Jo  Jo 

and  recalling  that  T*  is  band-limited,  we  get : 

lEM^(s)  =  f  T'(x)|x|'’'“^dx,  Vs  s.t.  T  <  5s 

Jo 

The  same  value  for  A“(X)  would  have  been  estimated,  if  instead  of  X  directly 
we  had  analyzed  the  new  random  variable  Y  =  as  discussed  in  Section  3.4,  and 
estimated  A“''(y)  =  —  A“(2f)  from  the  tail  decay  of  the  transformed  distribution. 

This  observation  bears  a  convenient  consequence  as  far  as  negative  moments  are 
concerned;  We  can  fully  exploit  the  behavior  of  W (s)  for  s  >  s^ax^  leading  us  to  a 
simple  estimator  of  A~  in  (2).  To  illustrate  this,  we  now  choose  a  compound  process 
similar  to  before  but  replace  the  uniform  distribution  for  |2f|  <  5  with  a  Gamma  dis¬ 
tribution  of  parameter  0  <  7  <  1.  The  density  of  the  Gamma  distribution  behaves  as 
I  •  around  the  origin,  whence  negative  moments  exist  exactly  for  negative  orders 
p  >  X~  =  —7.  Therefore,  we  estimate  the  slope  of  log  W (s)  for  s  >  Smax  =  5“^  and 
compare  this  estimate  against  the  theoretical  value  A“  =  —7  (see  table  2). 

To  summarize,  given  K  i.i.d.  random  variables  {Xj,  j  =  1,  •  •  •  K},  the  wavelet 
estimator  (32)  behaves  like: 

•  IT(s)  ~  for  S  <  Smin  = 

•  IT(s)  ~  for  Sniin  <  s  <  Smax,  where  Smax  Corresponds  to  the  inverse  of 

the  cut-off  value  separating  the  tail  from  the  body  of  the  underlying  distribution, 

•  IT(s)  ~  S~P  ,  for  S  >  Smax- 

This  is  impressively  demonstrated  in  Figure  4.4.2.  In  fine,  both  and  p~  can  be 
deduced  from  a  linear  regression  of  log  W{s)  versus  log  s,  over  the  corresponding  scale 
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Figure  2:  Scaling  region  and  cutoff  scales:  Choosing  the  scale  too  small,  the  resolu¬ 
tion  is  fine  enough  for  the  wavelet  to  analyze  the  individual  exponentials  that  form  the  estimator 
<j).  According  to  section  4.4.1  the  wavelet  coefficients  decay  (at  least)  with  exponent  N-^.  Choos¬ 
ing  the  scale  too  large,  the  estimator  samples  the  body  instead  of  the  tail  of  the  distribution;  thus, 
the  wavelet  coefficients  adhere  to  a  powerlaw  with  exponent  A” . 

intervals.  As  elaborated  in  section  3  choosing  an  appropriate  wavelet  and  according  to 
corollary  6,  we  have  A’*'  =  p'^  and  A“  =  p~  whenever  these  numbers  are  smaller  than 
2;  in  general,  p+  <  A^  <  4-  1  and  similar  for  A~. 

4.5  Choice  of  the  wavelet 

The  theoretical  results  of  section  3  form  the  basis  of  our  estimator.  For  them  to  hold 
the  analyzing  wavelet  ^  is  required  to  have  a  semi-definite  Fourier  transform  as  well 
as  a  number  of  vanishing  moments  larger  than  T^Re  ^(O). 

In  practice,  we  suggest  to  start  with  a  low  regularity  wavelet  such  as  the  second 
derivative  of  the  gaussian  window  corresponding  to  Np  =  2.  If  the  slope  p+ 

obtained  from  the  linear  regression  of  logfF(s)  versus  logs  is  smaller  than  =  2, 
then  corollary  2  immediately  posits  that  the  positive  critical  order  A'*'  is  equal  to  p'*'. 
Now,  if  the  measured  slope  p+  equals  =  2,  we  need  to  verify  whether  the  regularity 
p+  is  actually  larger  than  two  or  not. 

To  this  end,  we  increase  the  number  of  vanishing  moments  =  p  of  ijjpi't), 
and  repeat  the  estimation  of  p+  for  increasing  integer  p  as  long  as  the  slope  p+  hits 
the  bound  N^.  Once  we  get  a  p+  <  N^,  we  should  recall  corollary  6  which  only 
guarantees  that  A’^  can  not  exceed  [p+J  +  1.  This  could  be  of  interest  in  itself  for 
model  verification^ . 

^In  [9]  and  [24],  we  elaborate  on  how  to  use  fractional  wavelets  to  get  a  more  accurate  estimate  for  A+. 


23 


Despite  all  this,  in  all  our  experiments,  we  observed  that  the  basic  estimate  ob¬ 
tained  with  >  HnecpiO)  already  accurately  estimates  A+  on  its  own.  In  particular, 
we  never  encountered  the  case  <  A’*'  <  +  1,  that  indeed,  would  necessitate 

the  more  refined  procedure  described  in  [9,  24],  to  identify  the  characteristic  exponent 
precisely. 

5  Applications 

Application  of  particular  interest  in  this  context  are  the  parameter  estimation  of  stable 
laws  as  well  as  the  estimation  of  the  multifractal  partition  function. 

5.1  Estimating  Stable  and  Gamma  Parameters 

To  set  notation  we  recall  some  classes  of  distributions  well  known  in  the  literature,  that 
we  will  use  to  illustrate  our  characteristic  exponent  estimator. 


Pareto. 


A  Pareto  density  px  is  a  simple  power  law  function  that  take  on  the  form 


Px{x)  = 


apPx  “  ^ 


if  X  >  /i, 
else. 


(54) 


with  a  the  shape  parameter,  and  p,  the  position  parameter.  A  random  variable  X  with 
Pareto  distribution,  has  positive  r— th  order  moments  existing  only  for  orders  r  <  a, 
while  all  negative  orders  moments  exist  as  X  >  p  >  0  almost  surely.  The  median  is 
and  if  a  >  1  then  the  mean  exists  and  equals  JEX  =  pa/ {a  —  1). 


Stable.  Stable  laws  form  a  class  of  heavy  tailed  distributions,  for  which  there  exists 
an  abundant  literature  (see  e.g.  [25]  for  a  detailed  introduction).  A  random  variable 
X  follows  a  stable  law  that  we  denote  SaS{a,  P,  p),  if  and  only  if  its  characteristic 
function  reads : 


lE[exp(iu2f)]  =  exp(— ct“|m|“(1  —  iPwa{t))  +  ipu),  (55) 

where  WQ(f)  =  tan(7rasgn(f)/2)  for  a  7^  1  and'u;i(f)  =  — (2/7r)sgn(f)  log|f|. 

Although  there  exists  no  closed  form  for  stable  distributions  except  for  a  handful 
of  special  cases,  stable  laws  have  a  tail  behavior  that  can  be  approximated  as  a  Pareto 
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distribution  (54).  Indeed,  [25,  Property  1.2.15]  reads  as:  If  X  ~  with 

0  <  a  <  2,  then 

lim  A“P[2s:  >  A]  = 

X^oo  2 

lim  A“P[X  < -A]  =  (56) 

A^oo  2 

where  I/Cq,  =  x~°‘  sin(x)da;  depends  only  on  a. 

The  index  a  is  sometimes  referred  to  as  the  characteristic  exponent  of  the  stable 
law,  and  for  our  purpose,  it  constitutes  the  most  important  parameter  since  absolute 
moments  of  order  r  are  finite  exactly  for  r  G  (— 1,  a)  (0  <  a  <  2).  For  a  =  2  we 
recover  the  special  case  of  Gaussian  distribution,  with  existing  moments  at  all  orders 
r  >  —  1.  The  parameter  cr  indicates  scale,  since  X  ~  SaS{(7,  P,  fx),  then  aX  ~ 
SaS{aa,  P,afi)  (a  >  0).  For  a  =  2  we  have  =  var(2f)/2  while  for  a  <  2  the 
second  moment  IE[2f^]  is  infinite  and  the  variance  is  not  defined.  The  parameter  /x 
defines  position  in  the  sense  that  if  ~  SaS{a,  P,  /x)  then  +  c  ~  SaS{a,  P,fi  +  c). 
Provided  that  a  >  1  we  may  be  even  more  explicit  and  identify  /x  as  the  expected  value: 
IE[Ai]  =  /X.  However,  in  the  case  a  <  1  the  mean  IE[2f]  is  not  even  defined;  as  the 
most  prominent  example  we  mention  the  Cauchy  distribution.  Finally,  the  parameter 
P  provides  a  measure  for  the  skew,  more  precisely,  X  is  symmetrical  if  and  only  if 
P  =  O',  moreover,  if  this  is  the  case  then  (55)  reduces  to  (6). 


Gamma.  The  last  case  we  will  comment  on  is  the  Gamma  distribution.  A  random 
variable  X  has  Gamma  distribution  if 


r  Ax'*"  ^  exp{— cx}  if  X  >  0 
[^  0  else. 


(57) 


In  the  above,  7  and  c  are  positive  numbers,  and  A  =  with  F  the  generalized 

factorial  function.  The  special  case  7  =  n/2,  c=l/2  with  n  an  integer,  corresponds 
to  the  Chi-square  density  with  n  degrees  of  freedom,  and  for  n  =  2  it  reduces  to  the 
usual  exponential  density.  As  far  as  moments  are  concerned,  thanks  to  the  dominant 
exponential  decay  in  (57),  all  positive  order  moments  exist,  and  in  particular  WiX  = 
7/c  and  WiX^  =  7(7  +  l)/c^.  The  negative  moments,  i.e., 

pOO 

Mr  =  /  Ax’’"'"’’'”^  exp{— cx}  dx,  r  <  0,  (58) 

Jo 
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Table  1 :  Estimation  of  the  characteristic  exponent  a  of  a  stable  law,  using  Koutrouvelis 
procedure,  McCulloch  procedure  and  our  wavelet  based  procedure,  using  N  =  2^^ 
i.i.d.  samples  of  a  stable  variable.  Scale,  position  and  skew  parameters  are  fixed  (a  =  1, 
/r  =  0,  /?  =  0),  and  a  varies  in  (0,2).  Empirical  means  and  standard  deviations  (in 
parenthesis)  on  the  estimates  are  based  upon  a  1000  realizations  set. 


a 

0.2 

0.6 

1 

1.4 

1.8 

A+ 

0.196 

(0.007) 

0.58 

(0.018) 

1.0 

(0.035) 

1.46 

(0.066) 

1.74 

(0.02) 

a  (Koutrouvelis) 

ND 

(ND) 

0.60 

(0.007) 

1.0 

(0.009) 

1.403 

(0.013) 

1.80 

(0.012) 

a  (McCulloch) 

0.59 

(0.0018) 

0.605 

(0.009) 

1.0 

(0.009) 

1.40 

(0.016) 

1.80 

(0.022) 

converge  only  for  r  >  -7- 

Eor  the  above  classes  of  distributions,  Pareto,  stable  and  Gamma,  there  exist  effi¬ 
cient  procedures  aimed  at  estimating  the  different  sets  of  parameters.  In  most  cases, 
these  estimators  are  parametric  estimators  and  they  turn  out  to  be  optimal  (in  the  sense 
of  maximum  likelihood)  whenever  the  specific  underlying  distribution  model  and  the 
analyzed  data  distribution  do  match.  Our  estimator  (32)  is  non-parametric,  and  it 
should  not  be  expected  to  outperform  a  parametric  estimator  on  the  distribution  it  is 
tailored  for.  This  is  for  instance  very  clear  on  the  experiments  depicted  in  Table  1. 
Considering  N  i.i.d.  samples  of  a  stable  variable  X  ~  SaS{a,  (3,  ^),  we  compare 
our  estimates  (33)  of  a  against  two  well-known  parametric  estimators  for  stable  laws  : 
Koutrouvelis  [16]  and  McCulloch  [18]  procedures. 

Superiority  of  parametric  estimators  in  this  appropriate  context  is  not  questionable. 
However,  in  most  real  world  applications,  the  true  density  underlying  the  data  to  be 
analyzed  is  rarely  known,  and  very  likely  blind  application  of  parametric  estimators 
will  produce  aberrant  results.  A  very  illustrative  example  is  proposed  in  Table  2.  We 
consider  a  Gamma  variable  X  with  shape  parameter  0  <  7  <  1,  and  form  the  new  vari¬ 
able  Y  =  X~^.  Erom  (58)  we  know  that  r— th  order  moments  of  Y  should  only  exist 
for  r  <  7.  If  we  now  compare  the  (empirical)  densities  derived  both  from  Y  and  from 
a  stable  variable  with  characteristic  exponent  a  =  7  and  skewness  parameter  /?  =  1 
(which  ensures  positivity  since  a  <  1)  it  is  quite  difficult  to  dissociate  them  (Eigure  3). 
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Figure  3;  Empirical  distributions  of  the  random  variables  V  =  X~^  and  Z,  where  X 
follows  a  Gamma  law  with  7  =  0.6,  and  Z  follows  a  stable  law  with  a  =  0.6  and 
/3  =  1.  For  both  cases,  A+  =  a  =  7.  Axis  are  in  logarithmic  scale. 

Yet,  applying  crudely  stable  law  designed  estimators,  like  Koutrouvelis  or  McCulloch, 
to  the  raw  data  Y,  yields  very  bad  estimates  a  =  7  =  —  A“.  In  contrast,  determin¬ 
ing  the  characteristic  exponent  A'''(Y)  =  —  A“(X)  from  our  wavelet-based  regression 
procedure  (as  described  in  Section  3.4),  provides  us  with  fairly  good  estimates  of  shape 
parameters  7  for  Gamma  distributions.  Hence,  because  our  non-parametric  estimator 
does  not  assume  any  a  priori  distribution  for  the  data,  it  compares  favorably  as  a  gen¬ 
eral  purpose  tool  to  parametric  estimators  (see  for  instance  the  Hill  estimator  and  its 
various  improvements  [11,  22,  17,  3],  tail  estimators  [5,  19],  and  the  comparative  study 
conducted  in  [2]). 

Discussion  and  Conclusions 

We  itemize  the  three  main  results  we  have  derived  in  this  paper. 

•  We  have  established  a  theoretical  connection  between  three  exponents  namely 
the  critical  exponent  A"*"  which  fixes  the  highest  order  of  existing  moments  for  a 
random  variable,  the  tail  parameter  of  its  probability  distribution  and  the  charac¬ 
teristic  exponent  which  captures  the  Lipschitz  regularity  of  the  characteristic 
function  at  origin. 
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Table  2:  Estimation  of  the  shape  exponent  7  from  a  Gamma  variable  X.  Koutrouvelis 
procedure,  McCulloch  procedure  and  our  wavelet  based  procedure  are  applied  to  the 
heavy  tail  transformed  variable  Y  =  X~^.  N  =  2^^  i.i.d.  samples  of  Gamma  variable 
where  used.  Parameters  c  =  1  is  fixed,  and  7  varies  in  (0, 1).  Empirical  means  and 
standard  deviations  (in  parenthesis)  on  the  estimates  are  based  upon  a  1000  realizations 
set. 


7 

0.2 

0.4 

0.6 

0.8 

-P 

0.204 

(0.007) 

0.395 

(0.008) 

0.589 

(0.015) 

0.793 

(0.03) 

a  (Koutrouvelis) 

ND 

(ND) 

0.433 

(0.006) 

0.56 

(0.007) 

0.67 

(0.009) 

a  (McCulloch) 

0.513 

(0.000) 

0.514 

(0.000) 

0.583 

(0.009) 

0.72 

(0.013) 

•  We  proposed  a  wavelet  based  estimator  of  A+  and  A“,  that  allows  for  an  ex¬ 
traordinarily  simple  implementation.  Moreover,  this  characteristic  exponent  es¬ 
timator  is  non-parametric  and  does  not  assume  any  a  priori  knowledge  on  the 
underlying  distribution,  not  even  Pareto. 

•  Prom  an  application  point  of  view,  this  estimator  shows  very  useful  at  character¬ 
izing  rare  events  (often  responsible  for  divergence  of  moments)  and  measuring 
power  law  decays  of  fat  tail  distributions.  We  also  mentioned  a  particularly  in¬ 
teresting  application  of  this  estimator  in  the  context  of  model  selection  in  multi- 
fractal  analysis. 
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